Type-Checking Regions
Because of recursive functions, there can be any number of live regions at run time. The compiler uses the following general strategy to ensure that only pointers into live regions are dereferenced:
- Use compile-time region names. Syntactically these are just type variables, but they are used differently.
- Decorate each pointer type and handle type with one or more region names.
- Decorate each program point with a (finite) set of region names. We call the set the point’s capability.
- To dereference a pointer (via
*
, ->, or subscript), each region name that appears in the pointer’s type name must be in the program point’s capability. Similarly, to use a handle for allocation, the handle type’s region name must be in the capability. - Enforce a type system such that the following is impossible: A
program point P’s capability contains a region name
`r
that decorates a pointer (or handle) expression expr that, at run time, points into a region that has been deallocated and the operation at P dereferences expr.
This strategy is probably too vague to make sense at this point, but it may help to refer back to it as we explain specific aspects of the type system.
Note that in the rest of the documentation (and in common parlance) we abuse the word “region” to refer both to region names and to run-time collections of objects. Similarly, we confuse a block of declarations, its region-name, and the run-time space allocated for the block. (With loops and recursive functions, “the space allocated” for the block is really any number of distinct regions.) But in the rest of this section, we painstakingly distinguish region names, regions, etc.
Region Names
Given a function, we associate a distinct region name with each program point that creates a region, as follows:
- If a block (blocks create stack regions) has label L, then the
region-name for the block is
`L
. - If a block has no label, the compiler makes up a fresh region-name for the block.
- In region r <
`foo
> s, the region-name for the construct is`foo
. - In region r s, the region-name for the construct is
`r
. - In region h = open(k) s, the region-name for the construct is
`r
, assuming k has typeregion_key_t<`r,_>
(where _ is some other region name of no consequence).
The region name for the heap is `H
. Region names associated with
program points within a function should be distinct from each other,
distinct from any region names appearing in the function’s prototype,
and should not be `H
. (So you cannot use H as a label name, for
example.) Because the function’s return type cannot mention a region
name for a block or region-construct in the function, it is impossible
to return a pointer to deallocated storage.
In region r <`r> s
, region r s
, and region r = open(k) s
the type of r is region_t<`r>
(assuming, that k has type
region_key_t<`r,_>
). In other words, the handle is decorated with the
region name for the construct. Pointer types’ region names are
explicit, although you generally rely on inference to put in the
correct one for you.
Capabilities
In the absence of explicit effects (see below), the capability for a program point includes exactly:
`H
- The effect corresponding to the function’s prototype. Briefly, any
region name in the prototype (or inserted by the compiler due to
an omission) is in the corresponding effect. Furthermore, for each
type variable
`a
that appears (or is inserted), “regions(`a
)” is in the corresponding effect. This latter effect roughly means, “I don’t know what`a
is, but if you instantiate with a type mentioning some regions, then add those regions to the effect of the instantiated prototype.” This is necessary for safely type-checking calls that include function pointers. - The region names for the blocks and “region r s” statements that contain the program point
For each dereference or allocation operation, we simply check that the region name for the type of the object is in the capability. It takes extremely tricky code (such as existential region names) to make the check fail.
Type Declarations
Each pointer and region handle type is decorated with a set of region
names. This set of region names is referred to as the store or effect
qualifier of a type. For instance a pointer type might be
int *`r+`H
. This indicates the type of pointer to an integer that resides
in the region named `r
or the heap region, `H
. Similarly, a region
handle type might be region_t<`r1+`r2>
which indicates a handle to
region named `r1
or `r2
.
A struct, typedef, or datatype declaration may be parameterized by any number of effect qualifiers. The store qualifiers are placed in the list of type parameters. They may be followed by their kind; i.e. “::E”. For example, given
struct List<`a,`r>{`a hd; struct List<`a,`r> *`r tl;};
the type struct List<int,`H>
is for a list of ints in the heap,
while the type struct List<int,`l>
is for a list of ints in some
lexical region. Notice that all of the “cons cells” of the List will
be in the same region (the type of the tl field uses the same region
name `r
that is used to instantiate the recursive instance of
struct List<`a,`r>)
. However, we could instantiate `a
with a pointer type that has a different region name.
Subtyping and Effect Qualifiers
A pointer type’s effect qualifier is part of the type. If e1 and e2 are pointers, then e1 = e2 is well-typed only if all the regions names mentioned in the the effect qualifier for e2’s type also appears in the effect qualifier of e1’s type. For instance, both assignments to b below are legal, while the assignment to a is not.
void foo(int *`r a) {
int *`r+`H b = a;
if(!a) b = new 1;
a = b;
}
The store qualifier in the type of b indicates that it can point into
the region named `r
or into the heap region. Therefore, initializing b
with a pointer into `r
is certainly consistent with its store
qualifier. Similarly, the second assignment to b is legal since it is
updated to point to the heap region. However, the assignment to a is
not permitted since the declared type of a claims that it is pointer
into the region named `r
alone and b may actually point to the heap.
For handles, if `r
is a region name, there is at most one value
of type region_t<`r>
(there are 0 if `r
is a block’s name), so
there is little use in creating variables of type
region_t<`r>
. However, it could be useful to create variables of type
region_t<`r1+`r2>
. This is the type of handle to either the region
named `r1
or to the region named `r2
. This is illustrated by the
following code:
void foo(int a, region_t<`r> h) {
region_t<`r+`H> rh = h;
if(a) {
rh = heap_region;
}
}
As always, this form of subtyping for effect qualifiers is not permitted under a reference. Thus, the assignment in the program below is not legal.
void foo(int *`H *`r a) {
int *`r+`H *`r b = a;
}
Instead of the shorthand int *`r1+`r2
the more verbose int *@effect(`r1 + `r2)
may
also be used. The @effect
qualifier can be used in place of the @region
qualifier.
Function Calls
If a function parameter or result has type int *`r
or
region_t<`r>
, the function is polymorphic over the region name
`r
. That is, the caller can instantiate `r with any region in the
caller’s current capability as long as the region has the correct
kind. This instantiation is usually implicit, so the caller just calls
the function and the compiler uses the types of the actual arguments
to infer the instantiation of the region names (just like it infers
the instantiation of type variables).
The callee is checked knowing nothing about `r
except that it is
in its capability (plus whatever can be determined from explicit
outlives assumptions), and that it has the given kind. For example, it
will be impossible to assign a parameter of type int*`r
to a
global variable. Why? Because the global would have to have a type
that allowed it to point into any region. There is no such type
because we could never safely follow such a pointer (since it could
point into a deallocated region).
Explicit and Default Effects
If you are not using existential types, you now know everything you need to know about Cyclone regions and memory management. Even if you are using these types and functions over them (such as the closure library in the Cyclone library), you probably don’t need to know much more than “provide a region that the hidden types outlive”.
The problem with existential types is that when you “unpack” the type, you no longer know that the regions into which the fields point are allocated. We are sound because the corresponding region names are not in the capability, but this makes the fields unusable. To make them usable, we do not hide the capability needed to use them. Instead, we use a region bound that is not existentially bound.
If the contents of existential packages contain only heap pointers, then `H is a fine choice for a region bound.
These issues are discussed in Advanced Features.